The purpose of this workflow is to provide a simple demonstration of the reason why models like species distribution models fail to predict change in ecosystems when alternate ecosystem states (AESs) exist. The switch from one state to another is not a function of climate alone, but a function of feedback regimes that change through spatially contagious ecological processes and depend on initial conditions. The simulations show that even if the fit between environmental and response variables lead to accurate prediction in one time period, this does not necessarily mean that environmental conditions will be good predictors of the response variable at another time step.
The general workflow serves as a set of hypotheses against which we will evaluate the predictive power of species distribution models fit to the environment vegetation relationship of real data (Upper Midwest USA [UMW] stem density and community composition) at one time period and predicting another time period. This particular iteration shows one extreme: environmental conditions and the response variable are completely independent and only appear to be related to one another. We know our empirical system is not exactly like this. In the UMW, the distribution of savanna and forest ecosystem states is driven by a combination of environmental gradients and feedbacks maintaining distinct states in similar environmental conditions. Therefore, this hypothetical simulation should be interpreted as one extreme.
I simulate a 1D vector of initial ecosystem states. The ecosystem state is binary, representing distinct alternate states. I refer to the alternate states as savanna and forest, but they can be any AESs. I define the initial conditions such that across 1D space, savanna is adjacent to savanna and forest is adjacent to forest. They meet in the middle, comparable to an ecotone.
Now that I have the initial ecosystem state, I create the initial continuous response variable. The continuous response variable is a proxy for ecosystem state and is bimodal, with modes indicating different AESs. In this case, I refer to this variable as stem density, with low stem density indicative of the savanna ecosystem state and high stem density indicative of the forest ecosystem state. I define the distributions of stem density values based on observations of stem density in the savanna and forest ecosystems of the UMW. In this simulation, I also define a gradient in forest stem density, assuming that forests further from the savanna will have higher stem density
Now, I simulate environmental variables that would explain the distribution of ecosystem states/stem density at the first time step. At this first time step, I’ve chosen to simulate savanna in left cells and forest in right cells. Therefore, I will make a couple variables that follow left-to-right gradients. The first few (x1-x3) variables show a smooth left-to-right gradient with various amounts of random noise added. The last two variables (x4 and x5) are made to show a jump between the savanna and forest states (low values in one state and high values in the other) instead of a smooth gradient. These variables demonstrate that environmental conditions that could support either ecosystem state can show similar patterns in one time step. These types of variables are often chosen to explain the distribution of ecosystems/vegetation across space and used to make predictions of how ecosystems will change with climate change. In reality, the environmental conditions could support either ecosystem state.
Next, I evolve the system over time. Independent and response variables are evolved completely independently of each other. The change in independent variables does not affect the change in the response variable.
For the response variable, this involves simultaneously evolving the ecosystem state and total stem density, both dependent on the previous time step and surrounding grid cells. First, I evolve ecosystem state. The ecosystem state at each time step after the first (i.e., 2:T) depends on the ecosystem state at the same location and at adjacent locations in the previous time step. If the previous state was savanna (0) and it is surrounded by savanna, then the state at the current time step will be savanna (0). Same with forest. If the previous state was savanna (0), but the savanna was adjacent to forest (1), then the state at the current time step has an equal probability of being savanna (0) or forest (1).
Then, I evolve stem density. The stem density is dependent on both the stem density at the previous time step and the ecosystem state. If the ecosystem state remains the same from the previous to the current time step, then the stem density is the same with some random variation added. If the ecosystem state shifts from one state to the other, then the stem density is adjusted accordingly by redrawing randomly from the distribution of stem densities appropriate for the new ecosystem state. So, if a location shifts from savanna to forest, for instance, the stem density is randomly drawn from \(\sim N(250,75)\).
This is done for 150 time steps, with 150 being chosen arbitrarily. The process produces change in ecosystem state over time across our 1D space. Code is shown below for clarity of the process.
# Set seed again
set.seed(8)
# Loop over each row (time step) after initial conditions
for(i in 2:ntime){
# Loop over each column (location)
for(j in 1:nloc){
# Take adjacent locations from the previous time step,
# accounting for edges at j = 1 and j = nloc
if(j == 1) prev <- sim_binary[i-1,c(j,j+1)]
if(!(j %in% c(1, nloc))) prev <- sim_binary[i-1,c(j-1,j,j+1)]
if(j == nloc) prev <- sim_binary[i-1,c(j-1,j)]
## Determine current and new states
if(length(unique(prev)) == 1){
# If all adjacent cells are 0, then new step is 0
if(unique(prev) == 0) curr <- 0
# If all adjacent cells are 1, then new step is 1
if(unique(prev) == 1) curr <- 1
}
# If adjacent cells are a mix, then the new step is a 50/50 chance of
# staying the same state vs switching
if(length(unique(prev)) > 1) curr <- sample(x = c(0, 1), size = 1)
# Previous stem density
prev_density <- sim_cont[i-1,j]
# If the state is the same, then randomly vary density
if(sim_binary[i-1,j] == curr) curr_density <- rnorm(n = 1,
mean = prev_density,
sd = 5)
# Adjustments to ensure we maintain bimodality
if(curr == 0 & curr_density > 47) curr_density <- 47
if(curr == 0 & curr_density < 1) curr_density <- 1
if(curr == 1 & curr_density < 150) curr_density <- 150
# If there was a state shift, create a new stem density
if(sim_binary[i-1,j] != curr){
# If the state is savanna, draw from savanna distribution
if(curr == 0) curr_density <- runif(n = 1, min = 1, max = 47)
# If the state is forest, draw from forest distribution
if(curr == 1) curr_density <- rnorm(n = 1, mean = 250, sd = 75)
}
# Update the matrices
sim_binary[i,j] <- curr
sim_cont[i,j] <- curr_density
}
}
This provides us with both binary ecosystem state and continuous stem density that is dependent both on the previous state of the system and surrounding locations, representing spatially dependent feedbacks and disturbance regimes.
With this process evolution, the initial timestep has 50% forest grid cells and the final timestep has 64% forest grid cells, representing a loss of 14% of total savanna area in the hypothetical simulation.
At the same time, the independent variables are independently evolved over time. The independent variables in our case represent the abiotic environment experienced by these ecosystems, and the abiotic conditions that are typically used to explain spatiotemporal patterns in ecosystem state. Some of these variables change over time, sometimes directionally, sometimes not. Some of these variables do not change over time. This is reflected in the evolution process shown below.
# Loop over each row (time step) after initial conditions
for(i in 2:nrow(x1_mat)){
# Loop over each column (location)
for(j in 1:ncol(x1_mat)){
# Increasing pattern for variables 1 and 4
# Very small changes or else you get really large change over 150 time steps
# (temporal change far exceeds spatial change)
x1_mat[i,j] <- x1_mat[i-1,j] + rnorm(n = 1,
mean = 0.003,
sd = 0.01)
x4_mat[i,j] <- x4_mat[i-1,j] + rnorm(n = 1,
mean = 0.003,
sd = 0.01)
# Random change for variables 2 and 3
x2_mat[i,j] <- x2_mat[i-1,j] + rnorm(n = 1,
mean = 0,
sd = 0.02)
x3_mat[i,j] <- x3_mat[i-1,j] + rnorm(n = 1,
mean = 0,
sd = 0.02)
# No change in variable 5
x5_mat[i,j] <- x5_mat[i-1,j]
}
}
I want to check the independent variables for a few things. First, I make the same type of plots to make sure the spatiotemporal gradients look like what we expect to see. Next, I compare the environmental variables between the first and last time step to make sure that the change over time is not so dramatic as to be unrealistic.
Now that I have independent and response variables over time, I want to fit a model describing the relationship between the environment and stem density at the first time step. Then, I will use this relationship to try to predict stem density in the final time step, similar to how we often fit space-for-time substitution models. I compare the prediction accuracy from out-of-sample locations in the first time step to the same locations in the last time step.
##
## Call:
## glm(formula = density ~ var1 + var2 + var3 + var4 + var5, family = gaussian(link = "log"),
## data = simulation_in1)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.26251 0.31479 10.364 2.53e-15 ***
## var1 0.80918 0.15567 5.198 2.25e-06 ***
## var2 -0.12688 0.20278 -0.626 0.53372
## var3 0.38404 0.11309 3.396 0.00118 **
## var4 0.70886 0.14533 4.878 7.44e-06 ***
## var5 -0.15555 0.07521 -2.068 0.04267 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 1786.907)
##
## Null deviance: 937045 on 69 degrees of freedom
## Residual deviance: 114366 on 64 degrees of freedom
## AIC: 730.56
##
## Number of Fisher Scoring iterations: 8
## Sample size: 70
## Number of trees: 1000
## Forest terminal node size: 1
## Average no. of terminal nodes: 44
## No. of variables tried at each split: 3
## Total no. of variables: 5
## Resampling used to grow trees: swor
## Resample size used to grow trees: 44
## Analysis: RF-R
## Family: regr
## Splitting rule: mse *random*
## Number of random split points: 10
## (OOB) R squared: 0.87681432
## (OOB) Requested performance error: 1672.90683803
Now, I can make out-of-sample predictions from these models at the first and last time step. The exact same grid cells are predicted in each time step. Overall lower explanatory power from the models fit to these simulations should be expected compared to the models fit to empirical data. This is because there is no true explanatory power in the simulations: environmental conditions are completely independent of stem density. In our accompanying empirical system, some of the spatial gradient of savanna and forest ecosystems is explained by environmental conditions, so explanatory power in the timestep in which the model was fit should be higher.
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
| Model | Initial time step | Final time step |
|---|---|---|
| Generalized linear model | 0.944 | 0.645 |
| Random forest | 0.956 | 0.707 |
| Generalized additive model | 0.941 | 0.683 |